A Corpus and Model Integrating Multiword Expressions and Supersenses

نویسندگان

  • Nathan Schneider
  • Noah A. Smith
چکیده

This paper introduces a task of identifying and semantically classifying lexical expressions in running text. We investigate the online reviews genre, adding semantic supersense annotations to a 55,000 word English corpus that was previously annotated for multiword expressions. The noun and verb supersenses apply to full lexical expressions, whether singleor multiword. We then present a sequence tagging model that jointly infers lexical expressions and their supersenses. Results show that even with our relatively small training corpus in a noisy domain, the joint task can be performed to attain 70% class labeling F1.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Can Recognising Multiword Expressions Improve Shallow Parsing?

There is significant evidence in the literature that integrating knowledge about multiword expressions can improve shallow parsing accuracy. We present an experimental study to quantify this improvement, focusing on compound nominals, proper names and adjectivenoun constructions. The evaluation set of multiword expressions is derived from WordNet and the textual data are downloaded from the web...

متن کامل

Building an Arabic Multiword Expressions RepositoryBuilding an Arabic Multiword Expressions RepositoryBuilding an Arabic Multiword Expressions RepositoryBuilding an Arabic Multiword Expressions RepositoryBulding an Arabic Multiword Expressions Repository

We introduce a list of Arabic multiword expressions (MWE) collected from various dictionaries. The MWEs are grouped based on their syntactic type. Every constituent word in the expressions is manually annotated with its full context-sensitive morphological analysis. Some of the expressions contain semantic variables as place holders for words that play the same semantic role. In addition, we ha...

متن کامل

UW-CSE at SemEval-2016 Task 10: Detecting Multiword Expressions and Supersenses using Double-Chained Conditional Random Fields

We describe our entry to SemEval 2016 Task 10: Detecting Minimal Semantic Units and their Meanings. Our approach uses a discriminative first-order sequence model similar to Schneider and Smith (2015). The chief novelty in our approach is a factorization of the labels into multiword expression and supersense labels, and restricting first-order dependencies within these two parts. Our submitted m...

متن کامل

UW-CSE: Detecting Multiword Expressions and Supersenses using Double-Chained Conditional Random Fields

We describe our entry to SemEval 2016 Task 10: Detecting Minimal Semantic Units and their Meanings. Our approach uses a discriminative first-order sequence model similar to Schneider and Smith (2015). The chief novelty in our approach is a factorization of the labels into multiword expression and supersense labels, and restricting first-order dependencies within these two parts. Our submitted m...

متن کامل

SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)

This task combines the labeling of multiword expressions and supersenses (coarse-grained classes) in an explicit, yet broad-coverage paradigm for lexical semantics. Nine systems participated; the best scored 57.7% F1 in a multi-domain evaluation setting, indicating that the task remains largely unresolved. An error analysis reveals that a large number of instances in the data set are either har...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015